Customer Cohort Analysis
Cohort analysis groups customers by a shared starting event, here the month of their first purchase, and tracks how each group behaves over subsequent periods. The steps below use pandas to build the cohorts and seaborn with matplotlib to plot them for the provided dataset.
Step 1: Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import calendar
import warnings
import json
import os
from datetime import datetime
from itertools import combinations
from collections import Counter
Step 2: Load the Dataset
# pandas and warnings are already imported in Step 1
warnings.filterwarnings("ignore")  # suppress all warnings to keep the notebook output clean
Customer_data_df = pd.read_csv(r"C:\Users\jki\Desktop\Data Scence Projects\Customer Segmentaion Cohort Analysis\Machine Learnign Project\Source Data\customer orders.csv", encoding="ISO-8859-1")
Customer_data_df.head(5)
| | Id | Full Name | Address | age | gender | Latitude | Longitude | Email Adress | Order Datetime | Order Status | Order Total | Items | Total sales | Order Count | Rating | Average Rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10 | Adam Martinez | China, Beijing Shi | 30 | Female | 40.0 | 116.0 | adam.martinez@internalmail | 9/14/2021 | CANCELLED | 209 | Boy's Coat (Blue) | 80 | 1 | 8 | 6 |
| 1 | 20 | Adam Miller | 193, Bannerghatta Main Rd | 68 | Female | 13.0 | 78.0 | adam.miller@internalmail | 9/18/2021 | COMPLETE | 54 | Boy's Coat (Blue) | 10,816 | 90 | 4 | 6 |
| 2 | 30 | Adam Walker | Behrenstraße 42 | 70 | Female | 53.0 | 13.0 | adam.walker@internalmail | 9/22/2021 | COMPLETE | 43 | Boy's Coat (Blue) | 319 | 3 | 8 | 6 |
| 3 | 40 | Adan Lamica | Behrenstraße 42 | 72 | Female | 53.0 | 13.0 | adan.lamica@internalmail | 9/26/2021 | COMPLETE | 305 | Boy's Coat (Brown) | 137 | 3 | 5 | 6 |
| 4 | 50 | Adeline Iannotti | Floreasca Park 43 Soseaua | 16 | Female | 44.0 | 26.0 | adeline.iannotti@internalmail | 9/30/2021 | COMPLETE | 153 | Boy's Coat (Brown) | 3,936 | 64 | 6 | 6 |
Step 3: Convert Order Datetime to Datetime Format
# Convert 'Order Datetime' to datetime format
Customer_data_df['Order Datetime'] = pd.to_datetime(Customer_data_df['Order Datetime'], format='%m/%d/%Y')
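With an explicit `format`, `pd.to_datetime` raises an error on the first value that does not match. A minimal sketch of one way to check for malformed dates before converting, using a small made-up series rather than the real dataset (the sample values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical sample: two valid dates and one malformed entry
sample = pd.Series(["9/14/2021", "9/18/2021", "not a date"])

# errors="coerce" turns unparseable values into NaT instead of raising
parsed = pd.to_datetime(sample, format="%m/%d/%Y", errors="coerce")

# Count how many rows failed to parse
n_bad = int(parsed.isna().sum())
print(f"{n_bad} value(s) could not be parsed")
```

Rows that come back as `NaT` can then be inspected or dropped before the cohort is computed.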
Step 4: Extract the Cohort (Month of First Purchase)
# Extract the cohort (month of first purchase) for each customer
Customer_data_df['Cohort'] = Customer_data_df.groupby('Email Adress')['Order Datetime'].transform('min').dt.to_period('M')
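To see what this step produces, here is a small sketch on a toy frame. The three orders and the email values are invented for illustration; only the column names follow the dataset. `transform('min')` broadcasts each customer's earliest order date back onto every one of their rows, so repeat orders inherit the cohort of the first purchase.

```python
import pandas as pd

# Hypothetical orders: customer "a" buys twice, customer "b" once
orders = pd.DataFrame({
    "Email Adress": ["a@example", "a@example", "b@example"],
    "Order Datetime": pd.to_datetime(["2022-09-05", "2022-11-20", "2022-10-01"]),
})

# Earliest order per customer, repeated on each of that customer's rows,
# then truncated to a monthly period
orders["Cohort"] = (
    orders.groupby("Email Adress")["Order Datetime"]
    .transform("min")
    .dt.to_period("M")
)

cohorts = orders["Cohort"].astype(str).tolist()
print(cohorts)  # ['2022-09', '2022-09', '2022-10']
```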
Step 5: Calculate the Cohort Index (Months Since First Purchase)
# Calculate the time offset for each order within the cohort
Customer_data_df['Cohort Index'] = (Customer_data_df['Order Datetime'].dt.to_period('M') - Customer_data_df['Cohort']).apply(lambda x: x.n)
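The same month difference can also be computed with plain integer arithmetic on the year and month parts, which avoids the row-wise `apply`. This is an alternative sketch, not the notebook's method, and the period values below are made up for illustration:

```python
import pandas as pd

# Hypothetical order months and the matching cohort month for each row
order_month = pd.Series(pd.PeriodIndex(["2022-09", "2022-11", "2023-01"], freq="M"))
cohort_month = pd.Series(pd.PeriodIndex(["2022-09", "2022-09", "2022-09"], freq="M"))

# Months elapsed = 12 * (difference in years) + (difference in months)
cohort_index = (
    (order_month.dt.year - cohort_month.dt.year) * 12
    + (order_month.dt.month - cohort_month.dt.month)
)

print(cohort_index.tolist())  # [0, 2, 4]
```

An index of 0 marks the month of the first purchase itself.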
Step 6: Group by Cohort and Cohort Index
# Group by Cohort and Cohort Index, then count the number of unique customers
cohort_data = Customer_data_df.groupby(['Cohort', 'Cohort Index'])['Email Adress'].nunique().reset_index()
Step 7: Pivot the Data to Create a Cohort Matrix
# Pivot the data to create a cohort matrix
cohort_pivot = cohort_data.pivot(index='Cohort', columns='Cohort Index', values='Email Adress')
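The cohort matrix holds raw customer counts. Retention is often read more easily as a share of the original cohort, so one possible extension is to divide each row by its month-0 count. The sketch below assumes a small hypothetical matrix shaped like `cohort_pivot`; the numbers are invented and are not results from the dataset.

```python
import pandas as pd

# Hypothetical cohort matrix: rows are cohorts, columns are cohort indexes
cohort_pivot = pd.DataFrame(
    {0: [10, 8], 1: [5, 4], 2: [2, None]},
    index=pd.PeriodIndex(["2022-09", "2022-10"], freq="M", name="Cohort"),
)

# Column 0 is the number of customers who first purchased in each cohort month
cohort_sizes = cohort_pivot[0]

# Divide every row by its own cohort size to get retention as a fraction
retention = cohort_pivot.divide(cohort_sizes, axis=0)

print(retention.round(2))
```

Cells with no data yet, such as later months for the most recent cohort, stay `NaN` after the division.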
Step 8: Visualize the Cohort Analysis
# Plot the cohort analysis
plt.figure(figsize=(12, 8))
sns.heatmap(cohort_pivot, annot=True, fmt='.0f', cmap='Blues', linewidths=0.5)
plt.title('Cohort Analysis - Customer Retention')
plt.xlabel('Cohort Index (Months since first purchase)')
plt.ylabel('Cohort (Month of first purchase)')
plt.show()
The rows of the heatmap are cohorts labeled by the month of first purchase, mostly from 2022 and 2023. Each cell shows how many unique customers from that cohort placed an order a given number of months after their first purchase.
Here’s a breakdown of the key points:
Cohort Identification: The cohorts are identified by the month of the first purchase. For example, "2022-09" refers to customers who made their first purchase in September 2022.